Abstract. Let K(m) denote the smallest number with the property that every m-state finite automaton
can be built as a neural net using K(m) or fewer neurons. A counting argument shows that K(m) is at
least $\Omega((m \log m)^{1/3})$, and a construction shows that K(m) is at most $O(m^{3/4})$. The counting
argument and the construction allow neural nets with arbitrarily complex local structure and thus may
require neurons that themselves amount to complicated networks. Mild, and in practical situations

1. Introduction


It has been known since 1954 (see Minsky [9, 10]) that any (deterministic) finite
automaton can be realized or simulated by a neural net of the original type
specified by McCulloch and Pitts [7]. The simulation is quite simple: A finite
automaton with m states is replaced by a network containing 2m + 1 neurons.
However, if one considers that a network containing n neurons is capable of
$2^n$ states, one might wonder whether a more efficient simulation is possible. In
particular, it might be expected that an arbitrary m-state automaton could be
simulated by a neural net having O(log m) neurons. This turns out to be very
far from the truth. Counting arguments developed here show that in general one
cannot expect to use fewer neurons than some low-order fractional power of m,
along with a logarithmic factor. Depending on what restrictions are placed on
the network, one might do no better than $\Omega((m \log m)^{1/3})$. With mild
restrictions the exponent changes to 1/2. Finally, applying the kind of restrictions that
would typify a realistic hardware implementation of a network, we find
ourselves back in linear territory.

The final result is somewhat disappointing since it means that parallel 
hardware inspired by this somewhat primitive neural model is unlikely to host 
arbitrary finite automata with uniform high efficiency: Some automata will 
require networks that require about as many neurons as the automata have 
states! On the other hand, certain automata may be simulable by networks 
having a logarithmic number of neurons. A careful reading of this paper may 
produce the impression that more complicated neural network models will 
suffer from the same inherent inefficiency. We see nothing to contradict this
impression at the present time; see also the conclusions section in this paper. As
one of the referees put it: ‘‘The paper gives the impression of establishing the
disturbing result that there may have to be as many neurons in our head as there
are possible states to our mind.’’ Although it is totally unclear how to define the
‘‘state’’ of our mind, and although the neurons in our brains may very well be
more complicated than the ‘‘neurons’’ in this paper, we cannot help but agree
that this is the kind of conclusion our findings lead up to.

The finite state machines studied in this paper are of the type called ‘‘Mealy 
Machines,’’ see Mealy [8] or Hopcroft and Ullman [4]. Section 3 contains a 
self-contained brief discussion of Mealy machines. In order to prevent confusion
with the closely related ‘‘Neural Nets,’’ as currently defined in Cognitive
Psychology and Artificial Intelligence (see Rumelhart and McClelland [11]), we
have renamed Neural Nets and Neurons (as used in this paper) ‘‘Threshold
Machines’’ and ‘‘Threshold Cells,’’ respectively. Definitions will be given in
Section 4. It will be seen that classical AND, OR, and NOT gates are special 
cases of the threshold cells used in this paper, so that the negative results in this 
paper carry over to machines based on such classical gates. It is not clear 
whether this is also true for the positive (constructive) results. 

The problem addressed thus becomes: How many threshold cells may be
needed to build a threshold machine that acts ‘‘exactly like’’ a prespecified
m-state machine?

The exact formulation of this question of course depends on the definitions
given in Sections 3 and 4 and on a convention, to be given in Section 3, for
when two machines are said to act the same. A more intuitive description
might be as follows:

At each point in time t (t = 0, 1, ...; time is discrete) each of the threshold
cells or neurons of the threshold machine is in one of two states: either it does,
or it does not, fire. If the machine has K cells then, since each of the cells is in
one of two states, the whole machine has $2^K$ possible states. Hence, every
K-cell threshold machine is a finite state machine with $2^K$ states.


Efficient Simulation of Finite Automata by Neural Nets 497 


On the other hand, every m-state Mealy machine can be built as a threshold
machine (see Minsky [9, 10] or Section 5 in this paper). Clearly, there exist
m-state machines which can be built using only log m (take $m = 2^K$) threshold
cells. Would this be almost true for every m-state machine? Unfortunately, the
answer is negative. The actual number of threshold cells needed to build a 
specific machine depends on how complicated individual cells are allowed to be 
(this is expressed in the size of their weight alphabet, their threshold alphabet, 
and their Fan-in and Fan-out, see Section 4). With more complicated cells, the 
same machine can be built with fewer cells. Typical results are: 


THEOREM 1.1 (see Dewdney [1]). There exists a $C_1 > 0$ such that as long
as there are no restrictions on how complicated individual cells can be,
every m-state Mealy Machine can be built with


    at most $C_1 \cdot m^{3/4}$ threshold cells.    (1.1)


THEOREM 1.2. There exists a $C_2 > 0$ such that even if there are no
restrictions on how complicated individual cells can be, for every sufficiently
large m there exists an m-state Mealy Machine which in order to be
built needs


    at least $C_2 \cdot (m \log m)^{1/3}$ threshold cells.    (1.2)


The lower bound in (1.2) becomes higher when restrictions are placed on 
how complicated the individual cells can be. 

Even with only mild restrictions on those cells, the number of cells needed to
build an m-state machine can become linear in m.

For example: 


THEOREM 1.3. If either there is a limit on the Fan-in of individual cells,
or there are simultaneous limits on the Fan-out of cells, the weight
alphabet, and the threshold alphabet, then there exists a $C_3 > 0$ such that
for every sufficiently large m there exists an m-state Mealy Machine which
in order to be built as a threshold machine satisfying those restrictions
needs


    at least $C_3 \cdot m$ threshold cells.    (1.3)


The following result shows that at least for the second set of conditions 
Theorem 1.3 is about as sharp as possible. 


THEOREM 1.4. Every m-state Mealy machine can be built as a threshold
machine that contains only 2m + 1 threshold cells. This threshold machine
has a fixed weight alphabet {-1, 0, 1} and a fixed threshold alphabet
{1, 2} (both independent of m). This threshold machine also has the
property that all cells have a Fan-out of either 2 or 3.


The proof of Theorem 1.4 is by construction. The construction in Section 5 is
essentially the construction in Minsky [9], modified because we have binary
input. The threshold machine constructed in the proof need not have bounded
fan-in for its cells. This means that we cannot exclude the possibility that a
restriction on the fan-in of cells leads to m-state machines which in order to be
built need a superlinear (in m) number of cells.

Section 2 describes the idea of the proof of Theorems 1.2 and 1.3. Section 3 
discusses Mealy Machines and Section 4 discusses threshold machines and 
proves Theorems 1.2 and 1.3. Theorems 1.1 and 1.4 are known results (see 
Dewdney [1]). Section 5 gives an outline of the proofs of these theorems. The 
proofs are based on constructions. Not surprisingly, the construction in the 
proof of Theorem 1.1 uses weight alphabets and threshold alphabets that depend 
on m. 


2. The Lower Bound 


Definition 2.1. For each natural number m, K(m) is the smallest number
such that every Mealy Machine with m or fewer states can be built as a
threshold machine using K(m) or fewer threshold cells. Theorems 1.1 and 1.2
say that


    $C_2 (m \log m)^{1/3} \le K(m) \le C_1 m^{3/4}$.    (2.1)


The proof of the lower bounds in (2.1) works by counting the number of
‘‘really different’’ m-state Mealy Machines and the number of ‘‘really different’’
Mealy Machines that can be built as a threshold machine using K or
fewer cells. We still need to define when two Mealy Machines are ‘‘really
different’’ (the definition actually used will be given in Section 3), but for any
reasonable definition we trivially have:

Let L(m) be the number of ‘‘really different’’ Mealy Machines with m or
fewer states. Let U(K) be the number of different Mealy Machines that can be
built as a threshold machine using K or fewer cells. Then:


    $U(K(m)) \ge L(m)$.    (2.2)


In Section 3, we define when machines are ‘‘really different’’ and we derive a
lower bound for L(m). In Section 4, we derive an upper bound for U(K), and
we combine the lower and upper bounds to obtain bounds for K(m). In Section
4, we also obtain alternative bounds for U(K), with additional restrictions on
what kind of threshold machines are allowed, and use these to obtain stronger
lower bounds for K(m), based of course on stronger requirements.


3. Mealy Machines 
An m-state, binary Mealy Machine is a deterministic finite automaton which at
each point in time t (t = 0, 1, ...; time is discrete) is in a state
$S(t) \in \{1, 2, \ldots, m\}$ ($\{1, 2, \ldots, m\}$ is the state-space) and which at each point in
time t receives an input $I(t) \in \{0, 1\}$ ($\{0, 1\}$ is the input alphabet) and
generates an output $O(t) \in \{0, 1\}$ ($\{0, 1\}$ is the output alphabet).

The output O(t) depends on the state S(t) and the input I(t) through maps


    $g_i: \{1, 2, \ldots, m\} \to \{0, 1\}$  ($i \in \{0, 1\}$),    (3.1)


where
    $O(t) = g_{I(t)}(S(t))$.    (3.2)

FIGURE 1


The new state S(t + 1) depends on the old state S(t) and the input I(t) through
the maps


    $f_i: \{1, 2, \ldots, m\} \to \{1, 2, \ldots, m\}$  ($i \in \{0, 1\}$),    (3.3)

where
    $S(t + 1) = f_{I(t)}(S(t))$.    (3.4)
A Mealy Machine is entirely determined by its state space {1, 2, ..., m}, the
input alphabet, the output alphabet, and the maps $f_i$, $g_i$. Throughout this paper,
both input alphabet and output alphabet always are {0, 1} (‘‘one bit input, one
bit output’’).

The input alphabet can of course be replaced by any other two symbols, for 
example {solid, dotted}, and in the same way the output alphabet can be 
replaced by {bell, whistle}. Thus, the m-state Mealy machine can be thought of 
as a directed graph with m nodes, where every node has two outgoing arcs: one
‘‘solid’’ and one ‘‘dotted,’’ and where each of the 2m arcs is labeled
with either a ‘‘bell’’ or a ‘‘whistle.’’ An example, with m = 3, is shown in
Figure 1.

If, at time t, the machine is in state s and the input is ‘‘solid,’’ then the
machine moves along the solid outgoing arc from state s to the new state, and
the output is either a bell or a whistle, depending on the label of that arc.
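
The update rules (3.2) and (3.4) can be sketched in a few lines of Python. This is only an illustration: the function name `run_mealy`, the dictionary encoding of the maps f and g, and the particular 3-state machine are assumptions made for the example (it is not claimed to be the machine of Figure 1).

```python
def run_mealy(f, g, start, inputs):
    """Run a binary Mealy machine.

    f[i][s] is the next state and g[i][s] the output bit when the machine
    is in state s and reads input bit i (hypothetical encoding).
    Returns the list of outputs and the final state.
    """
    state = start
    outputs = []
    for bit in inputs:
        outputs.append(g[bit][state])   # O(t) = g_{I(t)}(S(t)), eq. (3.2)
        state = f[bit][state]           # S(t+1) = f_{I(t)}(S(t)), eq. (3.4)
    return outputs, state


# An arbitrary 3-state example (states 1, 2, 3), invented for illustration.
f_example = {0: {1: 2, 2: 3, 3: 1}, 1: {1: 1, 2: 1, 3: 2}}
g_example = {0: {1: 0, 2: 0, 3: 1}, 1: {1: 1, 2: 0, 3: 0}}

outputs, final_state = run_mealy(f_example, g_example, start=1, inputs=[0, 0, 0, 1])
print(outputs, final_state)  # [0, 0, 1, 1] 1
```

In the graph picture, input 0 plays the role of the ‘‘solid’’ arc and input 1 the ‘‘dotted’’ arc; the output bit is the bell/whistle label on the arc taken.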

A very interesting problem is how to decide whether two Mealy Machines are
the same or not. For example, we could say that two Mealy machines, say
machine 1 with $m^{(1)}, f^{(1)}, g^{(1)}$ and machine 2 with $m^{(2)}, f^{(2)}, g^{(2)}$, are ‘‘the
same’’ only if


    $m^{(1)} = m^{(2)}$    (3.5)
and
    $f_i^{(1)}(s) = f_i^{(2)}(s)$,  $g_i^{(1)}(s) = g_i^{(2)}(s)$,    (3.6)


both for all s and all i.
This very simple definition leads to the obvious conclusion that there are


    exactly $(2m)^{2m}$ different m-state machines.    (3.7)


This definition clearly is not acceptable. For example, a relabeling of the m
states certainly does not produce a different machine, while with high probability
it changes some $f_i(s)$ or $g_i(s)$. According to this train of thought,
$(2m)^{2m} \cdot (m!)^{-1}$ is an appealing approximation for the number of different
m-state machines. For a more rigorous treatment, we ask ourselves the
question: When is machine 2 an implementation of machine 1? In other words,
when is it true that any copy of machine 1 can be replaced by a copy of machine
2?


Definition 3.1. Machine 2, with $m^{(2)}, f^{(2)}, g^{(2)}$, is an implementation of
machine 1 with $m^{(1)}, f^{(1)}, g^{(1)}$ whenever there exists a map (called the
canonical map) Can: $\{1, 2, \ldots, m^{(1)}\} \to \{1, 2, \ldots, m^{(2)}\}$ with the property
that whenever machine 1 starts in any state $s^{(1)} \in \{1, 2, \ldots, m^{(1)}\}$ ($S^{(1)}(0) = s^{(1)}$)
and machine 2 starts in state $s^{(2)} = \mathrm{Can}(s^{(1)})$ ($S^{(2)}(0) = s^{(2)} = \mathrm{Can}(s^{(1)})$)
then for every common input sequence $(I(t))_{t=0}^{\infty}$, as long as both machines
receive that input sequence ($I^{(1)}(t) = I^{(2)}(t) = I(t)$ for all t) both machines
generate the same output sequence ($O^{(1)}(t) = O^{(2)}(t)$ for all t). (It is sometimes
preferable to define Can as a point-to-set map that assigns to each
$s \in \{1, 2, \ldots, m^{(1)}\}$ a nonempty subset Can(s) of $\{1, 2, \ldots, m^{(2)}\}$. This
does not change anything in any essential way.)

This paper does not need a solution to the problem of how to determine, for a 
given pair of machines, whether one is an implementation of the other, and of 
how to find the canonical map. However, it is easy to find the following 
characterization: 

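Machine 2 is an implementation of machine 1 if and only if there exists a
point-to-set map Can that assigns to each $s^{(1)} \in \{1, 2, \ldots, m^{(1)}\}$ a nonempty
subset $\mathrm{Can}(s^{(1)})$ of $\{1, 2, \ldots, m^{(2)}\}$ and that satisfies


    $g_i^{(2)}(s^{(2)}) = g_i^{(1)}(s^{(1)})$  for all $s^{(2)} \in \mathrm{Can}(s^{(1)})$,


and


    $\bigcup_{s^{(2)} \in \mathrm{Can}(s^{(1)})} \{ f_i^{(2)}(s^{(2)}) \} \subset \mathrm{Can}(f_i^{(1)}(s^{(1)}))$.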


The requirement in the following definition is probably slightly stronger than 
necessary. It is chosen to facilitate the argument in Section 4. It formalizes the 
notion of ‘‘really different’’ used in Section 2: 


Definition 3.2. Two machines, say with $m^{(j)}, f^{(j)}, g^{(j)}$ (j = 1, 2), are
‘‘divergent’’ if for every pair $(s^{(1)}, s^{(2)})$, $s^{(j)} \in \{1, 2, \ldots, m^{(j)}\}$, there exists a
common input sequence I(t) such that if machine j starts in state $s^{(j)}$
($S^{(j)}(0) = s^{(j)}$) and both machines get input sequence $(I(t))_{t=0}^{\infty}$, then for some
$t \ge 0$


    $O^{(1)}(t) \ne O^{(2)}(t)$.    (3.8)


Suppose we have three machines, with $m^{(j)}, f^{(j)}, g^{(j)}$ (j = 1, 2, 3). Suppose that the
machines 1 and 2 are ‘‘divergent’’ (in the sense of Definition 3.2) and that
machine 3 is an implementation of both machines 1 and 2. We thus have two
canonical maps $\mathrm{Can}_j: \{1, 2, \ldots, m^{(j)}\} \to \{1, 2, \ldots, m^{(3)}\}$ (j = 1, 2).
Clearly, $\mathrm{Can}_j(\{1, 2, \ldots, m^{(j)}\})$ is a subset of $\{1, 2, \ldots, m^{(3)}\}$. Equally
clearly, the fact that the machines 1 and 2 are divergent means that these two
subsets of $\{1, 2, \ldots, m^{(3)}\}$ are disjoint.

FIGURE 2


By repeating this argument we prove: 


Lemma 3.1. Suppose there exists a set of U Mealy Machines that are 
pairwise divergent, and suppose there exists an m-state machine which is 
an implementation of all those U machines. Then 


    $U \le m$.    (3.9)


We are now ready to proceed to the main topic of this section and obtain two 
lower bounds for the number of (pairwise) divergent m-state Mealy Machines 
that can be found. 

The first lower bound is weaker than the second. We present it because its 
proof nicely illustrates the concepts involved. 


THEOREM 3.1. For every $m \ge 1$, there exists a system of $2^m$ (pairwise)
divergent m-state Mealy Machines.


PROOF. The proof consists of establishing a system of $2^m$ (pairwise) divergent
m-state Mealy Machines. This is done by describing the maps $f_i$, $g_i$ that
are allowed. We only allow maps $f_i$ for which


    $f_i(s) = s + 1 \bmod m$  for all $1 \le s \le m$, all i.    (3.10)


Pictorially (see Figure 2) this means that the m states are arranged in a circle
and, independent of the inputs, the state of the system moves around the circle.
Further, we only allow maps $g_0$ for which


    $g_0(s) = \delta_{s,1}$  (Kronecker delta),    (3.11)


and finally, we do not put any restrictions on the map $g_1(\cdot)$. Since, for each s,
there are two possible values of $g_1(s)$, this gives us exactly $2^m$ machines. All
we need to do is prove that these $2^m$ machines are pairwise divergent. This will
be done by contradiction.

Suppose that two of these machines, say machine 1 and machine 2, are not
‘‘divergent.’’ Then there exist states $s^{(j)}$, j = 1, 2, such that if machine j starts
in state $s^{(j)}$ and both machines get the same (arbitrary) input sequence, then
both generate the same output sequence.


First, give both machines input sequence (0, 0, ...) (zeros only). At time t,
machine j is in state $s^{(j)} + t \bmod m$ and, by (3.11), has output $\delta_{1,\, s^{(j)} + t \bmod m}$.
Hence, $s^{(1)} = s^{(2)} = s_0$. Next, choose any $\tau \in \{1, 2, \ldots, m\}$ and let $d = \tau - s_0$


mod m ($0 \le d \le m - 1$). Give both machines as input d zeros followed by a


502 ALON ET AL 


one. At time t = d, both machines are in state $\tau$ and since their outputs are the
same we have $g_1^{(1)}(\tau) = g_1^{(2)}(\tau)$. Since $\tau$ was arbitrary, the two machines
therefore have identical maps $g_1(\cdot)$. This completes the proof.
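
For small m, the statement of Theorem 3.1 can be checked by brute force. The sketch below is illustrative only: it assumes states are relabeled 0..m-1 (so the single state marked by the Kronecker delta becomes state 0), and the helper names `are_divergent`, `theorem_3_1_family`, and `all_pairwise_divergent` are invented here. Divergence in the sense of Definition 3.2 is tested by computing which pairs of start states can be told apart by some input sequence.

```python
from itertools import product


def are_divergent(f1, g1, f2, g2):
    """Return True if every pair of start states can be told apart.

    f[i][s] / g[i][s] give next state / output for input bit i in state s,
    with states numbered 0..m-1 (a relabeling of the paper's 1..m).
    """
    m1, m2 = len(f1[0]), len(f2[0])
    # Pairs whose outputs differ immediately on some input bit.
    dist = {(a, b) for a in range(m1) for b in range(m2)
            if any(g1[i][a] != g2[i][b] for i in (0, 1))}
    changed = True
    while changed:  # propagate backwards until nothing new is found
        changed = False
        for a, b in product(range(m1), range(m2)):
            if (a, b) not in dist and any(
                    (f1[i][a], f2[i][b]) in dist for i in (0, 1)):
                dist.add((a, b))
                changed = True
    return len(dist) == m1 * m2


def theorem_3_1_family(m):
    """All 2^m machines with f_i(s) = s + 1 mod m and g_0 marking one state."""
    step = [(s + 1) % m for s in range(m)]
    g0 = [1 if s == 0 else 0 for s in range(m)]
    return [((step, step), (g0, list(g1)))
            for g1 in product((0, 1), repeat=m)]


def all_pairwise_divergent(machines):
    return all(are_divergent(fa, ga, fb, gb)
               for k, (fa, ga) in enumerate(machines)
               for (fb, gb) in machines[k + 1:])


family = theorem_3_1_family(3)
print(len(family), all_pairwise_divergent(family))  # 8 True
```

Note that a machine is never divergent from itself, since starting both copies in the same state yields identical outputs.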


THEOREM 3.2. If m is prime, then there exists a system of $(2m)^m (2^m - 2)/m$
(pairwise) divergent m-state Mealy Machines.


PROOF. Our constraints on the maps $f_i$, $g_i$ now are as follows:


    $f_0(s) = s + 1 \bmod m$,    (3.12)

    $g_0(s)$, for s = 1, 2, ..., m, are not all the same,    (3.13)
    $f_1(\cdot)$: arbitrary,    (3.14)

    $g_1(\cdot)$: arbitrary.    (3.15)


The constraints (3.12)-(3.15) define a class of Mealy machines. Clearly, there
are $(2m)^m (2^m - 2)$ machines in this class. The factor $m^{-1}$ will be introduced,
and the theorem will be proven, by showing that if any two machines in this
class are not divergent then one is a rotation of the other, in the sense that there
exists a d, $0 \le d \le m - 1$, such that if the states of machine 2 are relabeled
(state s is relabeled as state s + d mod m), then after the relabeling the
machines 1 and 2 have identical $f_i(\cdot)$ and $g_i(\cdot)$; see also Figure 3.

Suppose two machines, say machines 1 and 2, from the class are not
divergent. That means there are states $s^{(1)}$, $s^{(2)}$ such that if machine j starts in
state $s^{(j)}$ and both machines receive the same, but arbitrary, input stream then
their output streams are identical. By doing a rotation or relabeling of the
nodes, this time for both machines, we can ensure that $s^{(1)} = s^{(2)} = 1$.

First, give both machines input stream (0, 0, ...) (zeros only). Both
machines, at time t, are in state 1 + t mod m and the outputs at time t are
$g_0^{(1)}(1 + t \bmod m)$ and $g_0^{(2)}(1 + t \bmod m)$, which must be equal. Hence:


    $g_0^{(1)}(s) = g_0^{(2)}(s) = g_0(s)$  for all s.    (3.16)


Next, give as input s - 1 zeros followed by a one. At time s - 1, both
machines are in state s and the outputs are $g_1^{(j)}(s)$, which must be equal, so that


    $g_1^{(1)}(s) = g_1^{(2)}(s) = g_1(s)$  for all s.    (3.17)


Finally, choose any $s_0$, $1 \le s_0 \le m$, and give as common input $s_0 - 1$ zeros,
then a one, and then further only zeros. At time $s_0 - 1$, both machines are in
state $s_0$. At time $s_0$, machine j is in state $f_1^{(j)}(s_0)$, and at time $s_0 + t$ ($t \ge 0$)
machine j is in state $(f_1^{(j)}(s_0) + t) \bmod m$. We need to prove that $f_1^{(1)}(s_0) =
f_1^{(2)}(s_0)$. Let


    $s_j = f_1^{(j)}(s_0)$  (j = 1, 2)    (3.18)
and let
    $d = s_2 - s_1 \bmod m$,  $0 \le d \le m - 1$.    (3.19)


It will turn out that d = 0.
Because of the way $f_0$ is defined,


    $g_0(s_1 + t \bmod m) = g_0(s_1 + d + t \bmod m)$  for all $t \ge 0$.    (3.20)
In particular:


    $g_0(s_1) = g_0(s_1 + d \bmod m) = g_0(s_1 + 2d \bmod m) = \cdots$    (3.21)
and in fact


    $g_0(s_1) = g_0(s_1 + kd \bmod m)$  for all integer k.    (3.22)


Since m is prime, $d \ne 0$ in (3.22) implies that $g_0(s) = g_0(s_1)$ for all s, which
contradicts (3.13). Hence, d = 0, that is, $f_1^{(1)}(s_0) = f_1^{(2)}(s_0)$, which completes
the proof.


4. Threshold Machines 


A K-cell Threshold machine has K + 1 threshold cells, numbered 0, 1, ..., K,
interconnected by lines of various weights as explained below. At each point in
time t = 0, 1, ..., each cell either does or does not fire. We use variables
$x_i(t)$ to describe this:


    $x_i(t) = 1$ if cell i fires at time t,
    $x_i(t) = 0$ if cell i does not fire at time t.    (4.1)

$x_0(t) = I(t)$, the input at time t, and this variable is externally given: Cell zero
is the input cell. For $j \ge 1$, the value of $x_j(t)$ depends on $(x_i(t - 1))_{i=0}^{K}$
through weights $w_{i,j}$, $0 \le i \le K$, $1 \le j \le K$, and thresholds $\theta_j$, $1 \le j \le K$.
Many authors require those weights and thresholds to be integer. In this paper
we do not put any restrictions on the weights and thresholds (apart from the fact
that they must be real). The dynamics of the threshold machine is given by


    $x_j(t + 1) = 1$ if $\sum_{i=0}^{K} x_i(t) w_{i,j} \ge \theta_j$,
    $x_j(t + 1) = 0$ else,    (4.2)


hence, the names threshold machine and threshold logic. Clearly, every AND, 
OR, or NOT gate can be implemented by a single threshold cell (or threshold 
gate), while every threshold cell can be implemented using a (possibly large) 
number of AND, OR, and NOT gates. Hence, all the negative results in this 
paper translate into negative results about the power of machines based on 
AND, OR, and NOT gates. The positive (constructive) results do not necessar- 
ily carry over in the same way. Outputs are given by 


    $O(t) = x_K(t)$,    (4.3)
i.e., cell K is chosen to be the output neuron or output cell. This definition
shows that the input cell is not really a threshold cell: There are no inputs into
this cell and no threshold associated with this cell.
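
A minimal Python sketch of the threshold rule (4.2) follows. The names `fires` and `threshold_step`, the list-of-lists weight layout `w[i][j]`, and the particular weights chosen for the AND, OR, and NOT cells are assumptions made for illustration; other weight and threshold choices realize the same gates.

```python
def fires(x, weights_into_j, theta_j):
    """Threshold rule: the cell fires iff the weighted sum reaches theta_j."""
    total = sum(xi * wi for xi, wi in zip(x, weights_into_j))
    return 1 if total >= theta_j else 0


def threshold_step(x, w, theta, next_input):
    """One synchronous update of a K-cell threshold machine.

    x          -- current firing pattern [x_0, ..., x_K]; x_0 is the input cell
    w          -- w[i][j] is the weight on the line from cell i to cell j
    theta      -- theta[j] is the threshold of cell j (theta[0] is unused)
    next_input -- the externally supplied value of x_0 at the next time step
    """
    K = len(x) - 1
    new_x = [next_input]
    for j in range(1, K + 1):
        column = [w[i][j] for i in range(K + 1)]
        new_x.append(fires(x, column, theta[j]))
    return new_x


# Single-cell realizations of the classical gates (one possible choice).
def and_gate(a, b):
    return fires([a, b], [1, 1], 2)


def or_gate(a, b):
    return fires([a, b], [1, 1], 1)


def not_gate(a):
    return fires([a], [-1], 0)


# A 1-cell machine whose only threshold cell outputs the negated previous input.
w_not = [[0, -1], [0, 0]]
print(threshold_step([1, 0], w_not, [0, 0], next_input=0))  # [0, 0]
print(threshold_step([0, 0], w_not, [0, 0], next_input=1))  # [1, 1]
```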

With this description, it is clear that every K-cell threshold machine indeed is a
$2^K$-state Mealy Machine. It is easily seen (see Section 5) that on the other hand
every m-state Mealy Machine can be built as a threshold machine with 2m + 1
cells.

A threshold machine is entirely described by K and its weights $w_{i,j}$ and
thresholds $\theta_j$. On the other hand, two different sets of weights and thresholds
can easily generate the same machine. Suppose we have the machines 1 and 2
with the same numbers of cells and with weights and thresholds $w_{i,j}^{(\nu)}$, $\theta_j^{(\nu)}$,
$\nu = 1, 2$. We define these machines to be the same if for every
$j \in \{1, 2, \ldots, K\}$, for every sequence $(\delta_0, \delta_1, \ldots, \delta_K) \in \{0, 1\}^{K+1}$,


    $\sum_{i=0}^{K} \delta_i w_{i,j}^{(1)} \ge \theta_j^{(1)} \iff \sum_{i=0}^{K} \delta_i w_{i,j}^{(2)} \ge \theta_j^{(2)}$.    (4.4)
Clearly, two machines are the same under this definition if and only if they
have the property that for every subset $S_0$ of {1, 2, ..., K} and for every input
sequence $(I(t))_{t=0}^{\infty}$, if for both machines at time t = 0 exactly the neurons in
$S_0$ fire, and both machines receive the same input sequence $(I(t))_{t=0}^{\infty}$, then at
every point in time $t \ge 0$ the two machines have exactly the same set $S_t$ of
firing neurons. Also, with this definition, ‘‘being the same’’ defines an equivalence
relation on the set of all K-cell threshold machines. Machines not in the
same equivalence class can still be ‘‘the same’’ in some weaker sense than
defined above (see Section 3). However, the total number of equivalence
classes certainly gives an upper bound for the number of K-cell machines that
are different in any reasonable sense. We have the following result:


LEMMA 4.1. The number of equivalence classes in the relation above is
less than $2^{(K+1)^2 K + K}$.


PROOF. We choose a specific j, $1 \le j \le K$, and study the number of ways
we can choose $(w_{i,j})_{i=0}^{K}$ and $\theta_j$. Any hyperplane


    $\sum_{i=0}^{K} x_i w_i = \theta$    (4.5)


generates two such ways: $w_0, \ldots, w_K; \theta$ and $-w_0, \ldots, -w_K; -\theta$. Two
such hyperplanes are different (in the sense of the definition above) if they split
the $2^{K+1}$ points in $\{0, 1\}^{K+1}$ in two different ways into two subsets. A result
by Harding [2] shows that a set of n points in $R^d$ can be divided by a
hyperplane into two subsets in at most


    $\sum_{k=0}^{d} \binom{n-1}{k}$    (4.6)


different ways. For $N \ge 1$ fixed, we easily prove by induction on L that
$\sum_{k=0}^{L} \binom{N}{k} \le (N+1)^L$. Hence, for j fixed, we can choose $(w_{i,j})_{i=0}^{K}$, $\theta_j$ in at most


    $2 \sum_{k=0}^{K+1} \binom{2^{K+1} - 1}{k} \le 2^{(K+1)^2 + 1}$    (4.7)
effectively different ways. Since we can do this for j = 1, 2, ..., K, we
immediately have that the total number of effectively different ways to choose
the $w_{i,j}$, $\theta_j$ is at most $(2^{(K+1)^2+1})^K = 2^{(K+1)^2 K + K}$.
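
For a single cell with two inputs, the count of effectively different weight and threshold choices can be checked by enumeration. The sketch below is an illustration under stated assumptions: it reads the Harding bound as $\sum_{k=0}^{d} \binom{n-1}{k}$ partitions, doubled for the two orientations of the hyperplane, and it assumes that weights in {-1, 0, 1} with half-integer thresholds are enough to realize every linearly separable function of two bits. The function name is invented here.

```python
from itertools import product
from math import comb


def count_two_input_threshold_functions():
    """Count distinct truth tables realizable by one threshold cell with 2 inputs."""
    points = list(product((0, 1), repeat=2))
    tables = set()
    # Assumed search grid: small integer weights, half-integer thresholds.
    for w in product((-1, 0, 1), repeat=2):
        for theta in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5):
            table = tuple(int(w[0] * a + w[1] * b >= theta) for a, b in points)
            tables.add(table)
    return len(tables)


n, d = 4, 2   # the four points of {0,1}^2, viewed as points in the plane
harding_bound = 2 * sum(comb(n - 1, k) for k in range(d + 1))
print(count_two_input_threshold_functions(), harding_bound)  # 14 14
```

Of the 16 Boolean functions of two bits, only XOR and its negation are missing, so in this small case the doubled bound is attained exactly.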

Each K-cell threshold machine is a $2^K$ state Mealy machine which by Lemma
3.1 implements at most $2^K$ (pairwise) divergent machines. With Lemma 4.1,
this leads to:


THEOREM 4.1. The set of all Mealy machines that can be implemented by
a K-cell threshold machine cannot contain more than $2^{(K+1)^3}$ pairwise
divergent machines.


Namely, in the language of Section 2, we have


    $U(K) \le 2^{(K+1)^2 K + K} \cdot 2^K < 2^{(K+1)^3}$.    (4.8)
Combining, as promised, eq. (4.8) with Theorem 3.1, we get
    $2^{(K(m)+1)^3} > U(K(m)) \ge L(m) \ge 2^m$,    (4.9)
or
    $K(m) > m^{1/3} - 1$.    (4.10)
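
The exponent arithmetic behind (4.8) and the step to (4.10) can be written out as follows. This is only a restatement of the calculation, assuming the per-machine count of Lemma 4.1 is $2^{(K+1)^2 K + K}$ and that Lemma 3.1 contributes the extra factor $2^K$.

```latex
\begin{align*}
(K+1)^2 K + K + K &= (K+1)^2 K + 2K \\
                  &< (K+1)^2 K + (K+1)^2 = (K+1)^3, \\[4pt]
2^{(K(m)+1)^3} > U(K(m)) \ge L(m) \ge 2^m
  &\;\Longrightarrow\; (K(m)+1)^3 > m \\
  &\;\Longrightarrow\; K(m) > m^{1/3} - 1 .
\end{align*}
```

The strict inequality in the second line uses $2K < K^2 + 2K + 1 = (K+1)^2$.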


This result is not quite as good as promised in Theorem 1.2. To prove Theorem
1.2, we must combine Theorem 4.1 with Theorem 3.2. Let p(m) be the largest
prime $\le m$. Clearly, any Mealy machine with p(m) or fewer states can be
implemented as a threshold machine with K(m) or fewer cells. Hence


    $2^{(K(m)+1)^3} > U(K(m)) \ge L(p(m)) \ge (2p(m))^{p(m)} \, \frac{2^{p(m)} - 2}{p(m)}$,    (4.11)


or for m sufficiently large:
    $(K(m) + 1)^3 \log 2 > (p(m) - 1)(\log 4 + \log p(m))$,


    $K(m) \ge \left( (p(m) - 1) \left( \frac{\log p(m)}{\log 2} + 2 \right) \right)^{1/3} - 1$.    (4.12)


Since it is well known (see, e.g., Hardy and Wright [3]) that


    $\lim_{m \to \infty} \frac{p(m)}{m} = 1$,    (4.13)


this proves Theorem 1.2.
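
To get a feeling for the size of this bound, the counting inequality can be evaluated numerically for small m. The sketch below assumes the inequality reads $2^{(K(m)+1)^3} > (2p)^p (2^p - 2)/p$ with p the largest prime not exceeding m; the function names are invented for the example, and the returned value is only the lower bound implied by that inequality, not K(m) itself.

```python
def largest_prime_at_most(m):
    """Largest prime p <= m, by trial division (fine for small m)."""
    for p in range(m, 1, -1):
        if all(p % q for q in range(2, int(p ** 0.5) + 1)):
            return p
    raise ValueError("there is no prime <= m")


def counting_lower_bound(m):
    """Smallest K with 2^((K+1)^3) > (2p)^p (2^p - 2) / p, p = largest prime <= m.

    Any smaller K violates the counting inequality, so K(m) is at least
    the value returned here.
    """
    p = largest_prime_at_most(m)
    target = (2 * p) ** p * (2 ** p - 2) // p   # exact: p divides (2p)^p
    K = 0
    while 2 ** ((K + 1) ** 3) <= target:
        K += 1
    return K


for m in (2, 3, 7, 100, 1000):
    print(m, counting_lower_bound(m))
```

Python integers have arbitrary precision, so the comparison is exact even when the target has thousands of bits.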

Theorem 4.1 puts no restriction on the set of weights that are allowed, the set 
of thresholds that are allowed, or the Fan-in or Fan-out of cells. This means 
that it is very hard, probably impossible, to ‘‘mass produce’’ one standard 
generic threshold cell that then can be used to build all desired threshold 
machines. 

Next, we investigate the consequences of being restricted to a prespecified set
of weights, independent of the number of cells or the number of states in the
machine.

THEOREM 4.2. Consider a prespecified set $W = \{w_1, \ldots, w_n\}$ and consider
the family of threshold machines that have weights from W only. We
call W the weight alphabet of this family of machines. On this family, the
function U satisfies:


    $U(K) \le |W|^{K(K+1)} \cdot (2^{K+1} + 1)^K \cdot 2^K$.    (4.14)
By combining this with Theorem 3.2, we get exactly as in (4.11)-(4.13):
    $K(m) \ge C_4 (m \log m)^{1/2}$.    (4.15)


PROOF. We only need to prove (4.14). In constructing the threshold machine,
for each of the K(K + 1) weights there are |W| choices; this
gives the factor $|W|^{K(K+1)}$. For each j, $\sum_i \delta_i w_{i,j}$ ($(\delta_0, \delta_1, \ldots, \delta_K) \in \{0, 1\}^{K+1}$)
has at most $2^{K+1}$ different values so that there are at most
$2^{K+1} + 1$ effectively different values for $\theta_j$. This gives the factor $(2^{K+1} + 1)^K$.
Finally, the factor $2^K$ comes from Lemma 3.1.

A further practical restriction on threshold machines is the Fan-in and the
Fan-out of cells (see, e.g., Savage [12]). The Fan-in of cell j is defined as


    $|\{i : w_{i,j} \ne 0\}|$,    (4.16)
and the Fan-out of cell i is defined as
    $|\{j : w_{i,j} \ne 0\}|$.    (4.17)


The following two results show that a constraint on the fan-in of cells is much
more restrictive than a constraint on the fan-out of cells:


THEOREM 4.3. Suppose there exists a prespecified number F and cells
$i \ge 1$ are only allowed to have Fan-out $\le F$ (the input cell is allowed
unlimited fan-out). Then:


    $U(K) \le \binom{K}{F}^K \cdot 2^{K^2 (F+1)^2 + K} \cdot 2^K$    (4.18)


and there exists a $C_5 > 0$ such that for every sufficiently large m there
exists an m-state machine which (with the restriction on Fan-out above) in
order to be built as a threshold machine needs


    at least $C_5 (m \log m)^{1/2}$ threshold cells.    (4.19)


PROOF. For each cell $i \ge 1$, choose F cells j for which $w_{i,j}$ is allowed to
be nonzero. This gives the factor $\binom{K}{F}^K$. Given these choices, let $A_j$ be the
maximal possible Fan-in of cell j. Clearly,


    $A_j \ge 0$,  $\sum_{j=1}^{K} A_j = KF + K = K(F + 1)$,    (4.20)
so that
    $\sum_{j=1}^{K} A_j^2 \le K^2 (F + 1)^2$.    (4.21)

By the same argument as in the proof of Lemma 4.1, we see that for any j the
weights $w_{i,j}$ and the threshold $\theta_j$ can be chosen in less than $2^{A_j^2 + 1}$ different
ways. This, with (4.21), gives the factor $2^{K^2 (F+1)^2 + K}$, and Lemma 3.1 adds
another factor $2^K$. This completes the proof of (4.18). The proof now is
completed by combining (4.18) with Theorem 3.2, using the fact that
$\binom{K}{F} \le K^F$. □

The following two results show that under certain circumstances the number 
of cells needed to build an m-state machine may grow linearly in m. 


THEOREM 4.4. Suppose there exists a prespecified number F, and only
threshold machines where all the cells have a Fan-in $\le F$ are allowed. Then


    $U(K) \le \left( \binom{K+1}{F} \cdot 2^{F^2 + 1} \right)^K \cdot 2^K$    (4.22)


and there exists a $C_6 > 0$ such that for every sufficiently large m there
exists an m-state machine that (with the restriction on Fan-in above) in
order to be built as a threshold machine needs


    at least $C_6 \cdot m$ threshold cells.    (4.23)


PROOF. First, we prove (4.22). For any cell $j \ge 1$, choose a set of F cells i
for which $w_{i,j}$ is allowed to be nonzero. This gives the factor $\binom{K+1}{F}$. For that j,
the same argument as used in Lemma 4.1 shows that $\theta_j$ and the nonzero $w_{i,j}$
can be chosen in at most $2^{F^2 + 1}$ effectively different ways. We can do this for
every j (this gives the power K). The factor $2^K$ follows from Lemma 3.1. This
proves (4.22). The bound (4.23) is obtained by combining (4.22) with Theorem
3.2. □
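
The last step of the proof is terse, so a sketch of the arithmetic may help. It assumes that the factors named in the proof combine to $U(K) \le \bigl(\binom{K+1}{F} 2^{F^2+1}\bigr)^K 2^K$ and that Theorem 3.2 is applied with the largest prime $p = p(m) \le m$, as in the proof of Theorem 1.2; the constant c below is introduced only for this illustration and is not optimized.

```latex
\begin{align*}
\log_2 U(K) &\le K\bigl(F \log_2 (K+1) + F^2 + 2\bigr)
   && \text{since } \tbinom{K+1}{F} \le (K+1)^F, \\
\log_2 L(p) &\ge (p-1)\bigl(\log_2 p + 2\bigr)
   && \text{by Theorem 3.2, for } p \text{ prime}, \\
K(m)\bigl(F \log_2 (K(m)+1) + F^2 + 2\bigr) &\ge (p-1)\bigl(\log_2 p + 2\bigr)
   && \text{from } U(K(m)) \ge L(p(m)).
\end{align*}
```

If $K(m) \le c\,m$ held for arbitrarily large m with $cF < 1$, the left side would grow like at most $cF\, m \log_2 m$ while the right side grows like $m \log_2 m$ (because $p(m)/m \to 1$), which is impossible. Hence K(m) grows at least linearly in m, with a constant of order 1/F.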


THEOREM 4.5. Suppose we only allow threshold machines that satisfy the
following three restrictions:


(i) Each cell i ($1 \le i \le K$) has Fan-out $\le F$ (the input cell has unlimited
Fan-out).
(ii) There exists a prespecified finite set T, and all thresholds $\theta_j$ must be
from that set (T is the threshold alphabet that is allowed).
(iii) There exists a prespecified set $W = \{0, w_1, w_2, \ldots, w_n\}$, and all
weights $w_{i,j}$ must be from that set.

Then:


    $U(K) \le \left( \binom{K}{F} |W|^F \right)^K \cdot |W|^K \cdot |T|^K \cdot 2^K$    (4.24)


and there exists a $C_7 > 0$ such that for every sufficiently large m there
exists an m-state machine that (with the restrictions above) in order to be
built as a threshold machine needs


    at least $C_7 \cdot m$ threshold cells.    (4.25)


PROOF. For each cell $i \ge 1$, choose F cells j for which $w_{i,j}$ is allowed to be
nonzero. This gives the factor $\binom{K}{F}^K$. Each of these $w_{i,j}$ can have |W| different
values; this gives the factor $|W|^{FK}$. The factor $|W|^K$ is due to the freedom
in choosing $w_{0,j}$ ($1 \le j \le K$), the factor $|T|^K$ is due to the freedom in choosing
the thresholds $\theta_j$, and the factor $2^K$ follows from Lemma 3.1. This proves
(4.24). The bound (4.25) is obtained by combining (4.24) with Theorem 3.2. □

It is somewhat surprising that a restriction on the Fan-in of cells leads to a 
strong increase in K(m), while a similar restriction on the Fan-out needs 
additional conditions before it leads to the same result. In an attempt to explain 
this asymmetry, we define the Fan-in and Fan-out of a state in a Mealy 
Machine: 

The Fan-in of state s is the number of pairs (i, s') with $f_i(s') = s$.

The Fan-out of state s is the number of states in $\{f_i(s) : i = 0, 1\}$.

Clearly, each state has Fan-out either 1 or 2, while the Fan-in can be
anywhere between zero and 2m (both included). This observation may explain
the asymmetry in the powers of restrictions on the Fan-in and Fan-out. An
interesting question therefore is: Given that all states of an m-state Mealy
machine have Fan-in $\le F$, how many cells may be needed to build it as a
threshold machine? An interesting related question is: How many (pairwise)
divergent m-state Mealy machines can be found, if we allow only machines for
which all states have Fan-in $\le F$?

In Theorem 1.4, which will be proved in the next section, we promised a
construction that builds every m-state Mealy machine using exactly 2m + 1
cells. In light of the question raised above, it is interesting to observe that in
that construction, for every state s with Fan-in $F_s$, the construction uses two
cells (s, 0) and (s, 1), which each have Fan-in equal to $F_s + 1$. In addition,
there is an output cell of which the Fan-in may be as high as 2m.


5. Constructing Efficient Neural Nets 


The fewer neurons that we are allowed in constructing a network that simulates 
an m-state Mealy machine, the more difficult the job becomes. For a Mealy 
machine on m states, there is no problem constructing a simulating network 
that has 2m + 2 neurons. This includes the input neuron and, in the language 
of Section 4, it is therefore a 2m + 1 cell machine. 

In this construction, each state is represented by two neurons. State s
($1 \le s \le m$) is represented by the neurons (s, 0) and (s, 1) and we build the
threshold machine in such a way that at time t neuron (s, i) fires if and only if
the original Mealy machine at time t is in state s and receives input i. The
construction is illustrated in Figure 4. In addition to the input neuron and the
2m neurons obtained as above there is a (2m + 1)th neuron called the output
neuron.

First, we describe the weights. For any two states x and s,
$W_{(x,i),(s,0)} = W_{(x,i),(s,1)} \in \{0, 1\}$, and this equals 1 if and only if in the
original finite state machine state x with input i leads to the next state s (in
the language of Section 3: $f_i(x) = s$). For inputs, $W_{0,(s,0)} = -1$ and
$W_{0,(s,1)} = +1$ for all s. For outputs, $W_{(s,i),2m+1} = g_i(s)$ ($g_i(s)$ as in
Section 3). All weights not yet mentioned are equal to zero. The thresholds are
defined by $\theta_{(s,1)} = 2$, $\theta_{(s,0)} = 1$, and $\theta_{2m+1} = 1$.


It is clear that if at time t = 0 exactly one of the neurons (s, i), $1 \le s \le m$,
$0 \le i \le 1$, fires, then at any time $t \ge 1$ exactly one of those neurons will fire, and
the dynamics is exactly the same as that for the Mealy machine. It is not clear
whether, if by accident multiple neurons fire simultaneously, we ever reach the
situation where exactly one neuron fires. The outputs of the threshold machine
just constructed lag one time unit behind the outputs of the Mealy machine.
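The construction can be checked on a small machine with the sketch below. It rests on several assumptions that are not in the paper: the transition and output functions are stored as dictionaries `f[i][s]` and `g[i][s]`, the input bit is presented to the state neurons in the same step in which they are updated, and the three-state example machine is hypothetical.

```python
def build_network(m, f):
    """Weights and thresholds for the 2m neurons (s, i), states 1..m."""
    neurons = [(s, i) for s in range(1, m + 1) for i in (0, 1)]
    # w[(x, i)][(s, j)] = 1 iff f_i(x) = s, for both j = 0 and j = 1.
    w = {p: {q: int(f[p[1]][p[0]] == q[0]) for q in neurons} for p in neurons}
    # The input neuron has weight +1 into (s, 1) and -1 into (s, 0).
    w_in = {(s, i): (1 if i == 1 else -1) for (s, i) in neurons}
    # Thresholds: 2 for (s, 1) and 1 for (s, 0).
    theta = {(s, i): (2 if i == 1 else 1) for (s, i) in neurons}
    return neurons, w, w_in, theta


def step(firing, u, neurons, w, w_in, theta):
    """One synchronous update; u is the input bit paired with the new state."""
    return {q for q in neurons
            if sum(w[p][q] for p in firing) + w_in[q] * u >= theta[q]}


def simulate(m, f, g, s0, inputs):
    neurons, w, w_in, theta = build_network(m, f)
    firing = {(s0, inputs[0])}
    trace, outputs = [firing], []
    for u in inputs[1:]:
        # Output neuron: weight g_i(s) from neuron (s, i), threshold 1.
        # In the network this value appears one time unit later.
        outputs.append(int(sum(g[i][s] for (s, i) in firing) >= 1))
        firing = step(firing, u, neurons, w, w_in, theta)
        trace.append(firing)
    return trace, outputs


def mealy(f, g, s0, inputs):
    """Reference run of the Mealy machine itself."""
    s, pairs, outs = s0, [], []
    for u in inputs:
        pairs.append((s, u))
        outs.append(g[u][s])
        s = f[u][s]
    return pairs, outs


# Hypothetical 3-state machine.
f = {0: {1: 2, 2: 3, 3: 1}, 1: {1: 1, 2: 1, 3: 3}}
g = {0: {1: 0, 2: 1, 3: 0}, 1: {1: 1, 2: 0, 3: 1}}
inputs = [0, 0, 1, 1, 0, 1]
trace, net_out = simulate(3, f, g, 1, inputs)
pairs, ref_out = mealy(f, g, 1, inputs)
```

On this example exactly one neuron fires at every step, and the firing neuron is the pair (state, input) of the reference run.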

Of the $2^{2m}$ states defined by the neurons (s, i), only 2m are used to
implement the m-state machine. This enormous redundancy suggests that it
must be possible to implement the machine using far fewer than 2m neurons.
Indeed, we know that some m-state machines can be built using only log m
neurons, and Theorem 1.1 shows that there always is a significant improvement
available over the number 2m + 1. However, Theorems 1.2, 1.3, 4.2, 4.3,
4.4, and 4.5 also show that often we cannot get at all close to the ideal of log
m, and in particular when there are limits on the nature of the threshold cells
that can be used, the possible improvement over 2m + 1 may be quite limited.

Even when there is no restriction on the type of threshold cell that can be
used, it requires effort to do better than a linear number of neurons. Take, for
example, the construction about to be given. It represents the best result so far
in this direction. We begin with a Mealy machine M which has m states.
There are no restrictions on the structure of M. We show that any such Mealy
machine can be implemented using at most $O(m^{3/4})$ neurons. This construction
uses neurons with two different functionalities: there are state neurons and
transition neurons. The simulating network will have 2k state neurons, where
$k = \lceil m^{1/2} \rceil$. These state neurons fire in pairs, in accordance with the scheme
laid out in Figure 5.

The state neurons are labeled $a_1, \ldots, a_k$; $b_1, \ldots, b_k$, the a-set representing
rows of an implicit matrix, the b-set representing columns. A given state q is
simulated when one of the pairs $(a_i, b_j)$ fires. In Figure 5, for example, the
state q corresponds to the simultaneous firing of $a_i$ and $b_j$. This is the easy
part of the construction.

Now we must make sure that whenever the Mealy
machine is in a particular state the correct two state neurons fire. This we do by
building, for each state neuron, a "black box" of transition neurons that
make sure that the state neuron fires if and only if the Mealy machine is in one
of the states in the row or column of the state neuron. The threshold machine
thus becomes periodic. By periodic, we mean that the threshold machine
alternates between two phases: a transition phase, during which only transition
neurons fire, followed by a state phase, during which only state neurons fire.
The state phase is exactly one time unit long; the transition phase will be seen to
be exactly two time units long. To simplify comparison between the Mealy
machine and the threshold machine, we allow cells to fire at time t for t
integer (at those times only state neurons can fire) and for $t \equiv 1/3 \bmod 1$ or
$t \equiv 2/3 \bmod 1$ (at those times only transition neurons can fire). The transition neurons
will be organized in two layers. We also introduce an output neuron that can
fire at time t integer only, and outputs of the threshold machine lag behind
outputs of the Mealy machine by one time unit.

We now describe the
construction of the black boxes. In fact, there are two black boxes for each state
neuron: one to be used if the previous input was zero, the other if the previous
input was one. To simplify the language, we only describe the black box for a
"row" state neuron (say state neuron $a_x$), to be used when the previous input
was zero. Let S(x, 0) be the set of states with the property that if at time $t - 1$
the Mealy machine is in one of those states and the input is zero, then at time t
the machine is in one of the states in row x. S(x, 0) is a set of states and
hence a k × k incidence matrix, or a set of pairs $(a_i, b_j)$. The black box must
have the property that at time t neuron $a_x$ fires if and only if at time $t - 1$ for
some pair $(a_i, b_j)$ in S(x, 0) both $a_i$ and $b_j$ fired. The black box is a network
that recognizes whether a pair in S(x, 0) just fired and is accordingly called the
recognition network for cell $a_x$, for input zero.
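As an illustration of the sets S(x, 0), the sketch below computes them for a small machine. The row-major assignment of states to matrix positions, the 0-based numbering, and the five-state example are assumptions made here; the paper fixes the layout only through Figure 5.

```python
import math


def row_sets(m, f_i):
    """Return k and the sets S(x, i) for all rows x.

    f_i[q] is the next state of state q (0-based) on the fixed input i.
    State q is placed at row q // k, column q % k (an assumed layout).
    """
    k = math.isqrt(m - 1) + 1  # ceil(sqrt(m)) for m >= 1

    def cell(q):
        return (q // k, q % k)

    S = [set() for _ in range(k)]
    for q in range(m):
        x, _ = cell(f_i[q])      # row that contains the successor of q
        S[x].add(cell(q))        # q is stored as its (row, column) pair
    return k, S


# Hypothetical 5-state machine: on input 0, state q moves to q + 1 (mod 5).
k, S = row_sets(5, [1, 2, 3, 4, 0])
```

Each state lands in exactly one of the sets, so their sizes add up to m.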

We now discuss some easily recognizable special incidence matrices. A set U
of pairs $(a_i, b_j)$ is called simple if there exist nonnegative vectors
$(A_1, \ldots, A_k)$ and $(B_1, \ldots, B_k)$, and a positive threshold T, with the
property that


$$(a_i, b_j) \in U \quad \text{if and only if} \quad A_i + B_j \ge T. \qquad (5.1)$$


As long as exactly one row state neuron $a_i$ and exactly one column state
neuron $b_j$ fire, any simple set U can be recognized by a single neuron or cell
$C_U$, with weights and threshold


$$W_{a_i, U} = A_i, \quad W_{b_j, U} = B_j, \quad \theta_U = T. \qquad (5.2)$$


This cell recognizes set U by firing if and only if a state in U just occurred.
A set U of pairs $(a_i, b_j)$, i.e., a k × k incidence matrix U, is called a
"Northwest corner" matrix if there exist indices $\alpha_i$,


$$k \ge \alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_k \ge 0, \qquad (5.3)$$

with

$$U_{i,j} = 1 \quad \text{if and only if} \quad j \le \alpha_i. \qquad (5.4)$$

Equivalently, we could have required that there exist indices $\beta_j$,

$$k \ge \beta_1 \ge \beta_2 \ge \cdots \ge \beta_k \ge 0, \qquad (5.5)$$

with

$$U_{i,j} = 1 \quad \text{if and only if} \quad i \le \beta_j \qquad (5.6)$$


(with $\alpha_i = \max\{ j : U_{i,j} = 1 \}$ and $\beta_j = \max\{ i : U_{i,j} = 1 \}$). Similarly, we could
define "NE", "SW", and "SE" corner matrices. Instead, we define a k × k
incidence matrix U to be a "proto corner matrix" if there exist permutations $\sigma$
and $\tau$ of {1, 2, ..., k} such that $U_{\sigma(i), \tau(j)}$ is a (NW) corner matrix.

It is easily verified that every k × k NW corner matrix is simple and that
the weights $A_i$, $B_j$ and the threshold T all can be chosen in {0, 1, ..., k} and
such that


$$A_1 \ge A_2 \ge \cdots \ge A_k \ge 0, \qquad B_1 \ge B_2 \ge \cdots \ge B_k \ge 0. \qquad (5.7)$$
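One way to verify the claim is to pick explicit weights. The choice below, with $A_i$ equal to the row index bound $\alpha_i$, $B_j = k - j$, and T = k, is an assumption of this sketch (the paper does not spell out a choice); it satisfies the stated ranges and monotonicity, and $A_i + B_j \ge k$ holds exactly when $j \le \alpha_i$.

```python
def corner_weights(alpha):
    """Weights for the NW corner matrix U[i][j] = 1 iff j <= alpha[i].

    alpha is non-increasing with entries in 0..k; column index j is 1-based.
    """
    k = len(alpha)
    A = list(alpha)                        # A_i = alpha_i
    B = [k - j for j in range(1, k + 1)]   # B_j = k - j
    T = k                                  # positive threshold
    return A, B, T


def recognized(A, B, T):
    """Incidence matrix of the pairs on which a single threshold cell fires."""
    k = len(A)
    return [[int(A[i] + B[j] >= T) for j in range(k)] for i in range(k)]


alpha = [4, 2, 2, 0]
A, B, T = corner_weights(alpha)
U = [[int(j + 1 <= alpha[i]) for j in range(4)] for i in range(4)]
```

Here `recognized(A, B, T)` reproduces U, so one threshold cell suffices for this corner matrix.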


Hence, also every proto corner matrix is simple. In fact, it is easily seen that an
incidence matrix is simple if and only if it is a proto corner matrix. (Once
$A_1, \ldots, A_k$ and $B_1, \ldots, B_k$ are known, use permutations $\sigma$ and $\tau$ such that
(5.7) holds for the reordered matrix.) A set Q of pairs $(a_i, b_j)$ is called
semi-simple if it is the intersection of two simple sets U and V. Clearly, any
semi-simple set Q can be recognized using three neurons or cells: cells $C_U$
and $C_V$, as above, and a cell $C_Q$ with weights and threshold


$$W_{U,Q} = W_{V,Q} = 1, \quad \theta_Q = 2, \qquad (5.8)$$


so that cell $C_Q$ "ANDs" the cells $C_U$ and $C_V$. Cell $C_Q$ fires if and only if two
time units ago a pair $(a_i, b_j)$ in Q fired simultaneously. Now suppose we can
write


$$S(x, 0) = \bigcup_{l=1}^{L} S_l, \qquad (5.9)$$


where $S_1, S_2, \ldots, S_L$ (which need not be disjoint) are semi-simple in the sense
above. In that case, we can recognize S(x, 0) using 3L neurons. Of these, the
2L "V" and "U" neurons can fire at time $t \equiv 1/3 \bmod 1$ only, and the L
neurons for $S_1, S_2, \ldots, S_L$ can fire at time $t \equiv 2/3 \bmod 1$ only. Between the $S_l$
neurons and neuron $a_x$ we have weights and threshold

$$W_{S_l, a_x} = 1, \quad \theta_{a_x} = 1, \qquad (5.10)$$

so that neuron $a_x$ "ORs" the neurons $C_{S_1}, \ldots, C_{S_L}$. By this construction,
neuron $a_x$ fires at time t if and only if at time $t - 1$ a pair $(a_i, b_j)$ in S(x, 0)
both fired.

In order to complete the proof of Theorem 1.1, we need an upper
bound for the number of semi-simple sets (or semi-simple incidence matrices)
needed to cover a set (or incidence matrix) S. As in Dewdney [1], we define a
line matrix to be an incidence matrix that either is a subset of some row (all
ones are in that row), or is a subset of some column (all ones are in that
column), or is a transversal, i.e., has no two or more ones in any row or column. It
is easily seen that every line matrix is either simple or semi-simple: any
incidence matrix that is a subset of some row or column is simple, while every
transversal is the intersection of two proto corner matrices and therefore is
semi-simple. Dewdney [1] proved that any set S of pairs $(a_i, b_j)$ can be
covered by a set of L line matrices, with


$$L = O(|S|^{1/2}). \qquad (5.11)$$


The proof is based on "greedily" covering S by a sequence of line matrices
$B_1, B_2, \ldots$ and applying König's theorem (König [6]) to the matrices $B_i$
individually; see Dewdney [1] for details. Equation (5.11) gives an upper bound
for the number of line matrices that may be needed to cover S, and therefore an 
upper bound for the number of semi-simple matrices that may be so needed. 
The remainder of the proof shows that any improvement in (5.11) produces an 
improvement in the final result. It seems likely that an improvement is indeed 
possible by using more general intersections of pairs of proto corner matrices 
than just line matrices. 
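The statement that every transversal is the intersection of two simple sets, and hence recognizable with three cells, can be illustrated as follows. The particular weights are an assumption of this sketch: cell $C_U$ fires when $j \le \pi(i)$, cell $C_V$ fires when $j \ge \pi(i)$, and the AND cell of (5.8) fires only when $j = \pi(i)$. The dictionary `pi` describing the transversal is a hypothetical example.

```python
def transversal_cells(pi, k):
    """Two threshold cells whose intersection is the transversal {(i, pi[i])}.

    pi maps some rows i (1-based) to distinct columns; other rows are empty.
    Each cell is a triple (A, B, T) that fires on (i, j) iff A_i + B_j >= T.
    """
    # C_U fires iff j <= pi(i): A_i = pi(i), B_j = k - j, T = k.
    AU = [pi.get(i, 0) for i in range(1, k + 1)]
    BU = [k - j for j in range(1, k + 1)]
    # C_V fires iff j >= pi(i): A_i = k - pi(i), B_j = j, T = k.
    AV = [k - pi[i] if i in pi else 0 for i in range(1, k + 1)]
    BV = list(range(1, k + 1))
    return (AU, BU, k), (AV, BV, k)


def fires(cell, i, j):
    A, B, T = cell
    return A[i - 1] + B[j - 1] >= T


def recognized_pairs(pi, k):
    """Pairs on which the AND cell (weights 1, 1, threshold 2) fires."""
    U, V = transversal_cells(pi, k)
    return {(i, j)
            for i in range(1, k + 1)
            for j in range(1, k + 1)
            if int(fires(U, i, j)) + int(fires(V, i, j)) >= 2}


pi = {1: 3, 2: 1, 4: 2}   # row 3 carries no one
pairs = recognized_pairs(pi, 4)
```

Rows without an entry get weight 0 in both cells, so $C_U$ never fires on them and the intersection stays empty there.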

Let $n_x = |S(x, 0)|$. Since each state q is in the set S(x, 0) for exactly one
$a_x$, we have


$$\sum_{x=1}^{k} n_x = m, \qquad 0 \le n_x \le m \ \text{for all } x. \qquad (5.12)$$


This shows that 


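$$\sum_{x=1}^{k} \lceil \sqrt{n_x} \rceil \le k + \sqrt{k m} = \lceil \sqrt{m} \rceil + \sqrt{m} \cdot \lceil \sqrt{m} \rceil^{1/2} \sim m^{3/4}. \qquad (5.13)$$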
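The paper states only the result; one plausible route from (5.12) to (5.13), offered here as a sketch and not as the authors' argument, bounds each ceiling by its argument plus one and then applies the Cauchy-Schwarz inequality to the k terms $\sqrt{n_x}$:

```latex
\sum_{x=1}^{k} \lceil \sqrt{n_x} \rceil
  \le k + \sum_{x=1}^{k} \sqrt{n_x}
  \le k + \Bigl( k \sum_{x=1}^{k} n_x \Bigr)^{1/2}
  = k + \sqrt{k m}
  = \lceil \sqrt{m} \rceil + \sqrt{m} \, \lceil \sqrt{m} \rceil^{1/2}
  \sim m^{3/4}.
```

The last two steps use $\sum_x n_x = m$ from (5.12) and $k = \lceil m^{1/2} \rceil$.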


The total number of transition neurons in the black boxes for the row state
neurons, for input zero, is at most of the order $3 \cdot m^{3/4}$. The same argument
holds for transition neurons for column state neurons, and for input 1. We still
must make sure that the "V" and "U" neurons in the black box to be used
with input zero do not fire when the input is one (and vice versa). This is done
in the same way as in the construction of the 2m + 1 cell machine, earlier in
this section. Finally, for the output neuron, we also build two black boxes: one
for input zero, one for input one. Each of these black boxes contains at most 3k
neurons. Note that in the construction above, the $a_i$ and $b_j$ neurons can have Fan-out at
least as high as 2k (probably more if the sets U and V for different transversals
are not disjoint), and that the "U" and "V" cells can have Fan-in as high as
2k. From Dewdney [1] or the argument preceding (5.7) above it is clear that
the weights $A_i$, $B_j$ take values in {0, 1, ..., k} and that the thresholds T for
the "U" and "V" cells take values in {1, 2, ..., k}. The construction above
clearly, and not surprisingly, violates the conditions of Theorems 1.3, 4.2, 4.3,
4.4, and 4.5.

A final note concerns possible improvements to this construction. It would 
seem that one may build recognition networks (to drive each state neuron) that 
are far more efficient than the ones described here. 
Can one, for example, "recognize" an arbitrary pattern of n 1's by using no
more than O(log m) neurons in the recognition network? If so, the overall
complexity for a network simulating an m-state Mealy machine would drop to
$O(m^{1/2} \log m)$. As often happens in the closely related field of logic
optimization (see, for example, Savage [11]), there may be a trade-off between depth
and overall complexity: In other words, one may have to accept recognition 
subnetworks that have a nonconstant depth, which slowly increases as m gets 
large. 

Another possibility involves a construction that uses a 3-dimensional state 
matrix governed by 3k neurons, where 


$$k = \lceil m^{1/3} \rceil. \qquad (5.14)$$


But, according to Theorem 1.2, we can hardly expect to do better than this
since it implies a construction having


$$O(m^{1/3} \log m)$$


neurons, quite close to the lower bound of order $(m \log m)^{1/3}$ neurons.
It is intriguing, nevertheless, that we apparently cannot usefully employ a 
construction that implies a dimension higher than three! 


6. Conclusion 


The results in this paper amount to a limitation on the power of neural nets. In 
spite of the current climate of optimism, and in spite of successes in limited 
categories of computation, one cannot look to neural networks as the parallel 
panacea of the future. 

The limitations demonstrated here for the classic McCulloch-Pitts networks
apply with some force, obviously, to the case of connectionist machines. For
example, a Hopfield-style network (see Hopfield and Tank [5]), having as its
goal a solution of the traveling salesman problem amounts to a finite automaton 
in which all states (hopefully) lead to one or just a few states. 

We saw that while some m-state machines can be built using very few (in the 
order of log m) neurons, some need many more (namely, K(m)) neurons to be 
built. A naturally arising question is whether there are any interesting classes of 
finite automata that can be efficiently built as neural nets; namely, with a 
number of neurons that is very small compared with the size of the automata. 
Preliminary research by two of the authors of this paper indicates that while 
there are classes of finite state machines that can be efficiently simulated in this 
sense, all really interesting classes considered so far are too rich in structure 
and size. This research is continuing. 

In addition, there is a large number of technical questions about the bounds 
presented in this paper. For example, we suspect (see Theorem 4.4) that a 
limitation on the Fan-in of neurons may lead to m-state machines that can be 
simulated only by a superlinear (in m) number of neurons. 